Pooling the polls to estimate bias in past US House Elections

Introduction

In this analysis, I estimate the bias of each public pollster active in the last 6 congressional elections. My final estimates identify Gallup (Low-Turnout) as the most conservative pollster and AP/Ipsos as the most liberal. I likewise estimate the bias of various sampling universes. Next, I use these biases to estimate the true level of support for Democrats over time in each cycle. Lastly, I regress the final estimate of support in each cycle against the number of seats Democrats won.

The Data

I have two primary sources of data: past polls and election results. The poll response I use is the ‘generic Congressional ballot.’ Each pollster words it slightly differently (one reason we measure pollster bias), but all versions resemble: ‘If the elections for the U.S. House of Representatives were being held today, which party’s candidate would you vote for in your congressional district: the Democratic candidate or the Republican candidate?’ A named Congressional ballot question would account for incumbency effects and more closely mirror the choice voters face in the voting booth. However, since not all candidates are known for 2018 yet, the generic ballot is the only question currently being polled, and so, for comparability, I use the same question for past elections.

The past polls were taken from Real Clear Politics’ database across 6 election cycles: 2006, 2008, 2010, 2012, 2014 and 2016. Only polls where the year, date range, pollster, sampling universe and sample size are all known were included. Additionally, each poll’s result was transformed into the two-way share for Democrats (Dem/(Dem+Rep)): a proportion between 0 and 1. Time is transformed into the rounded number of weeks between the middle day of the poll and election day. A daily model would be more precise, but would require more data.
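These two transformations are simple enough to sketch outside the pipeline. Below is a minimal Python illustration (the analysis itself is implemented in R; see Appendix B), using a hypothetical 45D–41R poll fielded before the 2014 election:

```python
from datetime import date

def two_way_share(dem, rep):
    # Democrats' share of the two-party response: Dem / (Dem + Rep)
    return dem / (dem + rep)

def weeks_to_election(start, end, election_day):
    # rounded number of weeks between the poll's middle day and election day,
    # mirroring the 'week' calculation in Appendix B
    days_out = (election_day - end) + (end - start) / 2
    return round(days_out.days / 7)

# hypothetical poll: 45% Dem, 41% Rep, fielded Oct 20-24 before the 2014-11-04 election
share = two_way_share(45, 41)  # ~0.523
week = weeks_to_election(date(2014, 10, 20), date(2014, 10, 24), date(2014, 11, 4))  # 2
```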

In total, 797 polls from 41 pollsters, contacting 1.7m respondents over the 6 election cycles, were used. The 5 largest pollsters account for the bulk of these respondents. See Appendix B for full details.

For election results, I use both the popular vote share and the seats won. These were taken from Wikipedia: 2006, 2008, 2010, 2012, 2014, and 2016. Again, I use Democrats’ two-way vote share of the popular vote to mimic their two-way support in the polling data, and their percentage share of seats in the Congress.

First, let’s explore the trends over time in each cycle. Here, each point is a poll; its size reflects the sample size and its color represents the pollster. The dashed line marks Democrats’ final two-way popular vote share. Two observations are clear. First, in a given election, some pollsters are systematically off. For example, the pink pollster in 2010 was consistently below the final election result, suggesting bias. Second, there are trends in results over time. For example, in 2014 the polls got closer and closer to the true result as election day approached. Further investigation shows that poll results are not normally distributed around the result across time, suggesting we will need a time-dependent model.

Estimating pollster and universe bias

To estimate bias for each pollster and universe, I use a Bayesian random-walk model anchored to the true final election results. For the first cycle a pollster or universe appears in, its prior bias is normally distributed around 0pp and assumed to be less than 20pp, in either direction, 95% of the time. In each subsequent cycle, the prior is the posterior from the most recent previous cycle the pollster or universe was active in. The full specification of the theoretical model can be found in Appendix A; implementation details and key convergence diagnostics can be found in Appendix B.

Below I plot the final bias estimate for each pollster. For a pollster that polled in 2014 but not 2016, for example, this is its 2014 posterior. Most pollsters are not biased by more than a percentage point in either direction. ‘Gallup Low-Turnout’ had the most conservative estimate (4 polls in 1 election cycle). ‘AP/Ipsos’ most consistently overestimated Democratic support (4 polls in 2 election cycles). POS (R) was the least biased pollster, with an average bias of -0.00025 across its 3 polls in 1 cycle. Full results can be found in Appendix B.

Looking more closely at pollsters active in at least 5 of the 6 cycles examined, we see some variation in bias across cycles. For example, CBS/NYT strongly overestimated Democratic support in 2008, but became less and less biased each cycle. Others were too conservative in some cycles and too liberal in others. Fox News underestimated Democratic support in every cycle.

Additionally, we see that most sampling universes also overestimate Democratic support. The posteriors from the 2016 cycle show that likely voter universes across pollsters were biased 0.9pp in favor of Democrats, registered voter universes 1.3pp, and samples of all adults nearly 4pp. Full results can be found in Appendix B.

These trends were fairly stable over time. The rank order of the universes was the same in every election except 2006. Both the adult and registered voter universes have been stable around their final estimates since the 2010 cycle. In 2010, likely voter universes showed essentially no bias, but this bias grew over the following three elections.

Week-by-week estimates of support by cycle

Using the final estimates of bias for pollsters and universes as priors, I now refit the random-walk models, but with no anchor to the true result. This allows us to generate week-by-week estimates for each election, including a final estimate of the election outcome, simulating a forecast. The results are slightly overfit, especially for 2016, since the true results in each election updated the priors that are now inputs to the model. For 2016 specifically, the priors come from the posterior distribution of the model anchored in the true result, so we should expect the model to be very precise. For the full model specification, see Appendix A; for implementation, code and full results, see Appendix B.

The figure below highlights a few key trends. First, the estimated ‘true’ trendline for each election cycle sits slightly below the polls. This is because our estimates of universe bias (above) consistently overestimated Democratic support, while the additional bias from pollster house effects is split between over- and under-estimating Democratic support.

Second, while there appears to be significant variation over time in the polls, the trendlines are much smoother. In fact, week over week, the models estimate that 95% of movement is less than 1.5pp. The model used here has a single parameter for week-to-week movement, so it averages over big swings (see weeks 63-50 in 2014) and small, incremental changes (see weeks 100-63 in 2014). A model that separates these two regimes might better identify which swings are real, substantive movements and which should be smoothed.
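To see what that 1.5pp figure implies for the weekly random-walk step, note that under a normal step, 95% of moves fall within 1.96 standard deviations. A quick Python simulation (a sketch for intuition only, not part of the R pipeline) confirms the correspondence:

```python
import random

random.seed(1)
omega = 0.015 / 1.96  # weekly sd implied if 95% of moves are under 1.5pp
# draw 100,000 weekly steps from N(0, omega^2)
steps = [random.gauss(0, omega) for _ in range(100_000)]
# fraction of weekly moves smaller than 1.5pp in absolute value
share_within = sum(abs(s) < 0.015 for s in steps) / len(steps)  # ~0.95
```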

Cycle   Forecast   Popular Vote   Seat Share   Forecast Error
2006    57.2%      54.1%          53.6%        -3.1pp
2008    55.1%      55.5%          59.1%         0.5pp
2010    45.7%      46.5%          44.4%         0.8pp
2012    50.3%      50.6%          46.2%         0.3pp
2014    48.3%      47.1%          43.2%        -1.2pp
2016    50.9%      49.4%          44.6%        -1.5pp



Third, some cycles’ models are more accurate than others. The table above shows the percentage-point error of each final forecast. In 2008, 2010 and 2012, my models were very accurate. In 2014 and 2016, they missed by about a point, and in 2006 the forecast missed by 3.1pp. The error is not consistently in one direction; the model sometimes overestimates and sometimes underestimates Democratic support.
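The headline error figures follow directly from the table. As a quick check (values copied from the Forecast Error column above):

```python
# forecast errors in percentage points, from the table above
errors = [-3.1, 0.5, 0.8, 0.3, -1.2, -1.5]

# signs mix, so there is no single consistent direction of error
mean_error = sum(errors) / len(errors)                       # -0.7pp
mean_abs_error = sum(abs(e) for e in errors) / len(errors)   # ~1.23pp
```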

The model that adjusts for both past pollster and sampling universe bias is more accurate than a model that accounts for only pollster bias, or for no bias at all. Over the six cycles analyzed here, it reduces error by just over 0.3pp compared to the no-bias model and by just under 0.3pp compared to the pollster-bias-only model.

Lastly, we are interested in the relationship between the final forecasts and the actual share of seats won. Because of our winner-take-all and gerrymandered system, the popular vote rarely translates directly into the share of seats won. A simple linear regression shows that 69% of the variation in seats won over the last 6 elections is explained by the forecasts produced here. Additionally, Democrats need to be forecast to win about 52% of the popular vote to win 50% of the seats.
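Using the coefficients of the fitted regression reported in Appendix B (intercept -0.1496, slope 1.2382), the 52% breakeven can be recovered by inverting the fitted line; here is that arithmetic as a small Python check:

```python
# lm() coefficients from Appendix B: twoway_seat = intercept + slope * forecast
intercept, slope = -0.1496, 1.2382

# forecasted popular-vote share at which Democrats are predicted to win 50% of seats
breakeven = (0.5 - intercept) / slope  # ~0.525
```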

Conclusions

Using past public polling data and election results, I generated bias estimates for each pollster and sampling universe actively polling the generic Congressional ballot over the past six cycles. Generally, pollsters are consistently biased by less than 1pp, but several still exceed this threshold. Of the pollsters analyzed, I find POS (R) and NBC/WSJ are the least biased. Likewise, I find that likely voter sampling universes are less biased than registered voter or adult universes. All three universes tend to overestimate the true level of Democratic support.

I then use these (admittedly overfit) estimates of bias to generate forecasts for the six cycles analyzed. The final forecasts had a mean absolute error of about 1.2pp, though five of the six correctly predicted the winner. These forecasts are fairly predictive of the final seat share Democrats would win.

Appendix A

For a formal model, I follow Jackman (2005) to specify my model to estimate biases, but with an added term for sampling universe. A given poll is assumed to be normally distributed with support as the mean and the variance a function of \(y_i\) and sample size. This would be specified as: \[y_i \sim \mathcal{N}(\mu_i, \sigma^2_i)\] That poll is centered around mean \(\mu_i\), which itself is a function of \(\alpha_t\), the true value of support at the time the poll was taken \(t\), \(\delta_j\), the bias of pollster \(j\), and \(\theta_k\), the bias of sampling universe \(k\). Fully specified, this is: \[\mu_i = \alpha_{t_i} + \delta_{j_i} + \theta_{k_i}\] Due to the trends we see in our initial data exploration, a random walk model is appropriate. In such a model, support at time \(t\) is normally distributed around support at time \(t - 1\). \[ \alpha_t \sim \mathcal{N}(\alpha_{t-1}, \omega^2) \] By anchoring the model in the final election results, and by using a random walk, I will be able to estimate the consistent bias, \(\delta\), of each pollster and the effect, \(\theta\), of different sampling universes.

For these given specifications, we start with the following priors: \[ \sigma^2_i = \frac{y_i(1-y_i)}{n_i},\ \ \ \alpha_1 \sim \mathcal{U}(0.46, 0.56),\ \ \ \omega \sim \mathcal{U}(0, (0.02/1.96))\] \(\sigma^2_i\) simply follows the formula for the variance of a sample proportion. As a prior for the starting true value of support (\(\alpha_1\)), I use a uniform distribution over the minimum and maximum actual vote shares of Democrats in the six elections analyzed. Lastly, as a prior for the true week-to-week variability of support (\(\omega\)), I use a uniform distribution between 0 and 0.02/1.96 (about 0.01). This upper bound reflects that 95% of week-to-week movement is within about 2pp in either direction, a fairly weak assumption. These priors are similar to Strauss (2007).

\[\ \ \ \delta_j \sim \mathcal{N}(0, (0.2/1.96)^2),\ \ \ \theta_k \sim \mathcal{N}(0, (0.2/1.96)^2)\]

For pollster biases (\(\delta\)), I start with a prior of no bias, with a standard deviation reflecting that the bias is within 20pp in either direction 95% of the time; the prior for bias from sampling universe (\(\theta\)) is the same. However, these priors are updated based upon the previous election cycle. Thus, this diffuse prior applies only to the first cycle the pollster was active in (often 2006). Subsequently, the prior takes the mean of the posterior observations of \(\delta_{j_{most\ recent\ cycle}}\) and the variance of the same posterior observations.
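As a sketch of this updating rule (the actual implementation lives in `update_priors()` in `forecasting_functions.R`, which is not shown here), each new cycle's prior for a pollster is just the empirical mean and variance of its previous cycle's posterior draws:

```python
def next_cycle_prior(posterior_draws):
    # prior for the next cycle: mean and sample variance of last cycle's posterior draws
    n = len(posterior_draws)
    mu = sum(posterior_draws) / n
    var = sum((d - mu) ** 2 for d in posterior_draws) / (n - 1)
    return mu, var

# a pollster's first active cycle instead uses the diffuse prior N(0, (0.2/1.96)^2),
# i.e. |bias| < 20pp about 95% of the time
first_prior = (0.0, (0.2 / 1.96) ** 2)

# toy posterior draws, purely illustrative
mu, var = next_cycle_prior([0.01, 0.02, 0.03])  # (0.02, 0.0001)
```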

To fit the final forecasts, without anchoring in the election result, I use the final posterior distributions for each pollster and universe as priors. However, there needs to be some baseline to have a fully specified model. Whereas previously I used the election result as the baseline, in the forecasts, I use the pollster/universe with the smallest variance of the estimate. That is, I take as ground-truth our most certain estimate of bias and calculate the bias of other pollsters/universes relative to that.

Appendix B

Load packages, functions and other setup

library(ggplot2)
library(tidyverse)
library(rjags)
library(cowplot)
library(flextable)

source("forecasting_functions.R")

set.seed(102)
options(scipen = 999)

Load, prep and explore data

pollster_lkup <- read.csv("data/pollster_lkup.csv")

res <- read.csv("data/election_results.csv") %>%
  mutate(twoway_vote = dem_vote/(dem_vote+rep_vote),
         twoway_seat = dem_seats/(dem_seats+rep_seats)) %>%
  arrange(cycle)

polls <- read.csv("data/past_polls.csv") %>%
  mutate(twoway = dem/(dem+rep)) %>% 
  inner_join(res[,c("cycle","date")], by="cycle") %>%
  # 'week' = rounded weeks between the poll's middle day and election day
  mutate(week = round(as.numeric((as.Date(as.character(date),  format="%m/%d/%y") - 
           as.Date(as.character(end_date),  format="%m/%d/%y")) + 
           (as.Date(as.character(end_date),  format="%m/%d/%y") - 
           as.Date(as.character(start_date),  format="%m/%d/%y"))/2)/7),
         n_size = as.numeric(as.character(n_size)))

polling_summary <- polls %>% 
  group_by(pollster) %>%
  summarise(`Total N-Size` = sum(n_size), 
            `# of Polls` = n(), 
            `# of Cycles` = length(unique(cycle))) %>%
  arrange(desc(`Total N-Size`)) %>%
  inner_join(pollster_lkup, by = "pollster") %>%
    mutate(pollster_raw = factor(pollster_raw, levels = pollster_raw[order(`Total N-Size`)]))

polling_summary_ft <- polling_summary %>% 
  mutate(nsize = as.character(`Total N-Size`),
         polls = `# of Polls`,
         cycles = `# of Cycles`) %>%
  select(pollster_raw, nsize, polls, cycles)

FT2 <- flextable(polling_summary_ft)
FT2 <- set_header_labels(FT2, pollster_raw = "Pollster", nsize = "Total N-Size", polls = "# of Polls", cycles = "# of Cycles")
FT2 <- theme_zebra(x = FT2, odd_header = "#CFCFCF", odd_body = "#EFEFEF",
even_header = "transparent", even_body = "transparent")
FT2 <- align(x = FT2, j = 1, align = "left", part = "all")
FT2 <- align(x = FT2, j = 2:4, align = "center", part = "all")
FT2 <- bold(x = FT2, bold = TRUE, part = "header")
FT2

Pollster                    Total N-Size   # of Polls   # of Cycles
Rasmussen                        1140483          318             6
Quinnipiac                         61471           30             5
Gallup                             56170           35             5
Democracy Corps (D)                40457           42             5
Fox News                           40139           43             5
Pew                                29423           20             5
Reuters/Ipsos                      28448           26             3
PPP (D)                            28410           32             4
CNN/ORC                            26294           33             6
NBC/WSJ                            21483           23             4
Politico/GWU/Battleground          18000           18             3
USA Today/Gallup                   16739           17             4
GWU/Battleground                   13800           14             4
Economist/YouGov                   12686           10             1
CBS/NYT                            12668           14             5
Bloomberg                          10153           12             4
McClatchy/Marist                    9376           11             4
ABC/WaPo                            9226           11             5
Newsweek                            8796           11             4
Gallup High-Turnout                 7724            4             1
Gallup Low-Turnout                  7724            4             1
Diageo/Hotline                      5690            7             2
Time                                5646            6             3
AP/GFK                              5536            6             4
Ipsos/McClatchy                     5504            6             1
NPR                                 5090            6             3
Battleground                        5018            5             3
AP/Ipsos                            4045            4             2
USA Today/Pew                       3912            3             1
Cook/RT Strategies                  3302            4             1
LAT/Bloomberg                       3215            3             2
Resurgent Republic (R)              3000            3             1
POS (R)                             2500            3             1
National Journal/FD                 2316            2             1
Zogby                               2013            2             1
McLaughlin (R)                      2000            2             1
Hotline/FD                          1428            3             1
Reason                              1003            1             1
Harris                              1001            1             1
Winston (R)                         1000            1             1
USA Today/PSRAI                      697            1             1

Estimate bias for pollsters and universes

two_sigma = 0.2
sigma2 = (two_sigma/1.96)^2

deltas <- data.frame(delta_cycle = 0,
                     delta_pollster = unique(polls$pollster),
                     delta_mu = rep(0, length(unique(polls$pollster))), 
                     delta_sigma2 = rep(sigma2, length(unique(polls$pollster))))
deltas_all <- deltas

thetas <- data.frame(theta_cycle = 0,
                     theta_univ = unique(polls$univ),
                     theta_mu = rep(0, length(unique(polls$univ))), 
                     theta_sigma2 = rep(sigma2, length(unique(polls$univ))))
thetas_all <- thetas

convergence <- list()

#Estimation
for(cycle in res$cycle) {
  data_jags <- data_prep(data = polls, res = res, year = cycle, anchor = T)
  data_jags <- bias_priors(data_jags = data_jags, deltas = deltas, thetas = thetas, anchor = T)
  convergence[[paste(cycle)]] <- convergence_diagnostics(data_jags = data_jags,
                                                         anchor = T,
                                                         chains = 4, 
                                                         thining = 10, 
                                                         burnin = 10000, 
                                                         iter = 1000000)
  mod_res <- run_model(data_jags = data_jags,
                       anchor = T,
                       chains = 4, 
                       thining = 10, 
                       burnin = 10000, 
                       iter = 1000000, 
                       params = c("delta", "theta"))

  prior_ests <- calculate_priors(mod_res = mod_res, year = cycle, data_jags = data_jags, anchor = T)
  new_priors <- update_priors(deltas_all = deltas_all, thetas_all = thetas_all, 
                              deltas_new = prior_ests$deltas_est, thetas_new = prior_ests$thetas_est)
  deltas <- new_priors$deltas 
  deltas_all <- new_priors$deltas_all 

  thetas <- new_priors$thetas 
  thetas_all <- new_priors$thetas_all
}

deltas <- deltas %>% 
    arrange(delta_mu) %>%
    inner_join(pollster_lkup, by = c("delta_pollster" = "pollster")) %>%
    mutate(pollster_raw = factor(pollster_raw, levels = pollster_raw[order(delta_mu)]))

deltas_all <- deltas_all %>%
    inner_join(pollster_lkup, by = c("delta_pollster" = "pollster"))

thetas <- thetas %>% 
    arrange(theta_mu) %>%
    mutate(theta_univ = factor(theta_univ, levels = theta_univ[order(theta_mu)]))
## Convergance diagnostics for sample 2006 parameters:
## Potential scale reduction factors:
## 
##           Point est. Upper C.I.
## delta[10]          1          1
## theta[3]           1          1
## xi[9]              1          1
## 
## Multivariate psrf
## 
## 1
##          delta[10]    theta[3]        xi[9]
## Lag 0   1.00000000 1.000000000 1.0000000000
## Lag 10  0.54614410 0.842414407 0.6479339022
## Lag 50  0.35746885 0.570889504 0.2770018344
## Lag 100 0.21266196 0.346365247 0.0977201984
## Lag 500 0.00476482 0.008362083 0.0003705629
## Convergance diagnostics for sample 2008 parameters:
## Potential scale reduction factors:
## 
##          Point est. Upper C.I.
## delta[2]          1          1
## theta[2]          1          1
## xi[16]            1          1
## 
## Multivariate psrf
## 
## 1
##              delta[2]    theta[2]      xi[16]
## Lag 0    1.0000000000 1.000000000 1.000000000
## Lag 10   0.2008663658 0.810591275 0.546339430
## Lag 50   0.0523200172 0.352206104 0.281306715
## Lag 100  0.0051833456 0.126195478 0.118530308
## Lag 500 -0.0002175812 0.002406398 0.003007152
## Convergance diagnostics for sample 2010 parameters:
## Potential scale reduction factors:
## 
##           Point est. Upper C.I.
## delta[10]          1          1
## theta[3]           1          1
## xi[96]             1          1
## 
## Multivariate psrf
## 
## 1
##            delta[10]   theta[3]      xi[96]
## Lag 0    1.000000000 1.00000000 1.000000000
## Lag 10   0.191602780 0.70567278 0.497069151
## Lag 50   0.047676821 0.45326537 0.356198091
## Lag 100  0.010272039 0.29748969 0.240206424
## Lag 500 -0.002468651 0.01169564 0.006316747
## Convergance diagnostics for sample 2012 parameters:
## Potential scale reduction factors:
## 
##           Point est. Upper C.I.
## delta[15]          1          1
## theta[3]           1          1
## xi[45]             1          1
## 
## Multivariate psrf
## 
## 1
##             delta[15]      theta[3]      xi[45]
## Lag 0    1.0000000000  1.0000000000 1.000000000
## Lag 10   0.0233816404  0.3486880276 0.266722290
## Lag 50   0.0081948753  0.1401180564 0.106308451
## Lag 100  0.0048805013  0.0415554326 0.035292227
## Lag 500 -0.0002641986 -0.0001903756 0.001612574
## Convergance diagnostics for sample 2014 parameters:
## Potential scale reduction factors:
## 
##          Point est. Upper C.I.
## delta[9]          1          1
## theta[2]          1          1
## xi[86]            1          1
## 
## Multivariate psrf
## 
## 1
##              delta[9]     theta[2]        xi[86]
## Lag 0    1.0000000000  1.000000000  1.000000e+00
## Lag 10   0.0114703204  0.551802331  2.188369e-01
## Lag 50   0.0006921544  0.119901657  5.559910e-02
## Lag 100 -0.0008374459  0.021559699  8.840482e-03
## Lag 500 -0.0001438989 -0.001279462 -7.920388e-05
## Convergance diagnostics for sample 2016 parameters:
## Potential scale reduction factors:
## 
##           Point est. Upper C.I.
## delta[10]          1          1
## theta[2]           1          1
## xi[39]             1          1
## 
## Multivariate psrf
## 
## 1
##             delta[10]     theta[2]        xi[39]
## Lag 0    1.000000e+00  1.000000000  1.0000000000
## Lag 10   4.359223e-02  0.099004583  0.1572074203
## Lag 50   3.544292e-03  0.010580444  0.0134616524
## Lag 100  5.696531e-04  0.003168989  0.0008130858
## Lag 500 -8.839755e-05 -0.001716413 -0.0017817737
## Final estimates of pollster bias:

Pollster                    Cycle     Bias   Variance
Gallup Low-Turnout           2010   -0.060      0.000
Gallup High-Turnout          2010   -0.033      0.000
Resurgent Republic (R)       2012   -0.030      0.000
Fox News                     2016   -0.017      0.000
NPR                          2014   -0.015      0.000
Reason                       2014   -0.015      0.000
USA Today/Gallup             2012   -0.013      0.000
Reuters/Ipsos                2016   -0.013      0.000
McLaughlin (R)               2010   -0.012      0.000
Gallup                       2014   -0.012      0.000
Winston (R)                  2010   -0.011      0.000
CNN/ORC                      2016   -0.011      0.000
AP/GFK                       2016   -0.010      0.000
Bloomberg                    2016   -0.009      0.000
ABC/WaPo                     2016   -0.008      0.000
Politico/GWU/Battleground    2014   -0.007      0.000
LAT/Bloomberg                2008   -0.006      0.000
Quinnipiac                   2016   -0.005      0.000
GWU/Battleground             2016   -0.004      0.000
Zogby                        2006   -0.004      0.001
Pew                          2014   -0.003      0.000
McClatchy/Marist             2016   -0.003      0.000
Time                         2010   -0.003      0.000
Democracy Corps (D)          2014   -0.003      0.000
PPP (D)                      2016   -0.002      0.000
Battleground                 2010   -0.002      0.000
Rasmussen                    2016   -0.002      0.000
NBC/WSJ                      2016   -0.001      0.000
POS (R)                      2010   -0.000      0.000
Hotline/FD                   2006    0.004      0.001
CBS/NYT                      2016    0.004      0.000
Ipsos/McClatchy              2010    0.004      0.000
Economist/YouGov             2016    0.007      0.000
Harris                       2006    0.007      0.001
Diageo/Hotline               2010    0.009      0.000
USA Today/Pew                2014    0.010      0.000
National Journal/FD          2010    0.011      0.000
Newsweek                     2012    0.014      0.000
USA Today/PSRAI              2014    0.017      0.000
Cook/RT Strategies           2006    0.022      0.001
AP/Ipsos                     2008    0.033      0.000

## Estimate for each pollster and cycle:
## (If a pollster is missing from a cycle, it did not poll).

Pollster                    Cycle     Bias   Variance
ABC/WaPo                     2006   -0.008      0.001
AP/Ipsos                     2006    0.018      0.001
Battleground                 2006   -0.016      0.001
CBS/NYT                      2006    0.034      0.001
CNN/ORC                      2006    0.007      0.001
Cook/RT Strategies           2006    0.022      0.001
Democracy Corps (D)          2006   -0.007      0.001
Fox News                     2006   -0.003      0.001
Gallup                       2006    0.013      0.001
Harris                       2006    0.007      0.001
Hotline/FD                   2006    0.004      0.001
LAT/Bloomberg                2006    0.004      0.001
NBC/WSJ                      2006    0.005      0.001
Newsweek                     2006    0.014      0.001
Pew                          2006    0.006      0.001
Quinnipiac                   2006    0.002      0.001
Rasmussen                    2006    0.001      0.001
Time                         2006    0.014      0.001
USA Today/Gallup             2006   -0.024      0.001
Zogby                        2006   -0.004      0.001
ABC/WaPo                     2008   -0.013      0.000
AP/GFK                       2008    0.003      0.000
AP/Ipsos                     2008    0.033      0.000
Battleground                 2008   -0.008      0.000
CBS/NYT                      2008    0.039      0.000
CNN/ORC                      2008   -0.006      0.000
Democracy Corps (D)          2008   -0.016      0.000
Diageo/Hotline               2008   -0.024      0.000
Fox News                     2008   -0.014      0.000
Gallup                       2008    0.009      0.000
GWU/Battleground             2008   -0.009      0.000
LAT/Bloomberg                2008   -0.006      0.000
NBC/WSJ                      2008    0.013      0.000
Newsweek                     2008    0.003      0.000
Pew                          2008    0.007      0.000
Rasmussen                    2008   -0.001      0.000
Time                         2008    0.005      0.000
USA Today/Gallup             2008   -0.032      0.000
ABC/WaPo                     2010    0.007      0.000
AP/GFK                       2010   -0.004      0.000
Battleground                 2010   -0.002      0.000
Bloomberg                    2010    0.007      0.000
CNN/ORC                      2010   -0.009      0.000
Democracy Corps (D)          2010    0.001      0.000
Diageo/Hotline               2010    0.009      0.000
Fox News                     2010   -0.027      0.000
Gallup                       2010   -0.011      0.000
Gallup High-Turnout          2010   -0.033      0.000
Gallup Low-Turnout           2010   -0.060      0.000
GWU/Battleground             2010    0.004      0.000
Ipsos/McClatchy              2010    0.004      0.000
McClatchy/Marist             2010   -0.006      0.000
McLaughlin (R)               2010   -0.012      0.000
National Journal/FD          2010    0.011      0.000
Newsweek                     2010    0.016      0.000
NPR                          2010   -0.014      0.000
Pew                          2010    0.002      0.000
Politico/GWU/Battleground    2010    0.010      0.000
POS (R)                      2010   -0.000      0.000
PPP (D)                      2010   -0.005      0.000
Quinnipiac                   2010   -0.005      0.000
Rasmussen                    2010   -0.024      0.000
Reuters/Ipsos                2010    0.001      0.000
Time                         2010   -0.003      0.000
USA Today/Gallup             2010   -0.014      0.000
Winston (R)                  2010   -0.011      0.000
Bloomberg                    2012   -0.007      0.000
CBS/NYT                      2012    0.028      0.000
CNN/ORC                      2012   -0.011      0.000
Democracy Corps (D)          2012   -0.001      0.000
Gallup                       2012   -0.012      0.000
McClatchy/Marist             2012   -0.014      0.000
Newsweek                     2012    0.014      0.000
NPR                          2012   -0.017      0.000
Pew                          2012   -0.003      0.000
Politico/GWU/Battleground    2012   -0.005      0.000
PPP (D)                      2012    0.003      0.000
Quinnipiac                   2012   -0.004      0.000
Rasmussen                    2012   -0.009      0.000
Resurgent Republic (R)       2012   -0.030      0.000
Reuters/Ipsos                2012   -0.020      0.000
USA Today/Gallup             2012   -0.013      0.000
ABC/WaPo                     2014   -0.004      0.000
AP/GFK                       2014   -0.015      0.000
Bloomberg                    2014   -0.008      0.000
CBS/NYT                      2014    0.008      0.000
CNN/ORC                      2014   -0.010      0.000
Democracy Corps (D)          2014   -0.003      0.000
Fox News                     2014   -0.020      0.000
Gallup                       2014   -0.012      0.000
GWU/Battleground             2014   -0.007      0.000
McClatchy/Marist             2014   -0.007      0.000
NBC/WSJ                      2014    0.011      0.000
NPR                          2014   -0.015      0.000
Pew                          2014   -0.003      0.000
Politico/GWU/Battleground    2014   -0.007      0.000
PPP (D)                      2014   -0.002      0.000
Quinnipiac                   2014   -0.004      0.000
Rasmussen                    2014   -0.001      0.000
Reason                       2014   -0.015      0.000
USA Today/Pew                2014    0.010      0.000
USA Today/PSRAI              2014    0.017      0.000
ABC/WaPo                     2016   -0.008      0.000
AP/GFK                       2016   -0.010      0.000
Bloomberg                    2016   -0.009      0.000
CBS/NYT                      2016    0.004      0.000
CNN/ORC                      2016   -0.011      0.000
Economist/YouGov             2016    0.007      0.000
Fox News                     2016   -0.017      0.000
GWU/Battleground             2016   -0.004      0.000
McClatchy/Marist             2016   -0.003      0.000
NBC/WSJ                      2016   -0.001      0.000
PPP (D)                      2016   -0.002      0.000
Quinnipiac                   2016   -0.005      0.000
Rasmussen                    2016   -0.002      0.000
Reuters/Ipsos                2016   -0.013      0.000

## Final estimates of sampling universe bias:

Sampling Universe   Cycle    Bias   Variance
LV                   2016   0.009          0
RV                   2016   0.013          0
Adults               2016   0.039          0

## Estimate for each universe and cycle:

Sampling Universe   Cycle     Bias   Variance
Adults               2006    0.030      6e-04
LV                   2006    0.033      5e-04
RV                   2006    0.031      6e-04
Adults               2008    0.026      4e-04
LV                   2008    0.009      1e-04
RV                   2008    0.015      1e-04
Adults               2010    0.043      1e-04
LV                   2010   -0.002          0
RV                   2010    0.026          0
Adults               2012    0.037      1e-04
LV                   2012    0.004          0
RV                   2012    0.015          0
Adults               2014    0.037      1e-04
LV                   2014    0.007          0
RV                   2014    0.015          0
Adults               2016    0.039          0
LV                   2016    0.009          0
RV                   2016    0.013          0

Estimate week-by-week movement using past pollster and universe biases

all_cycle_est <- data.frame(iter_mean = numeric(0),
                            iter_sigma2 = numeric(0),
                            time_before_elec = numeric(0),
                            upper_bound = numeric(0),
                            lower_bound = numeric(0),
                            cycle = numeric(0))

omegas <- c()

for(cycle in res$cycle) {
  data_jags <- data_prep(data = polls, res = res, year = cycle, anchor = F)
  data_jags <- bias_priors(data_jags = data_jags, deltas = deltas, thetas = thetas, anchor = F)

  mod_res <- run_model(data_jags = data_jags, 
                       anchor = F,
                       params = c("xi", "omega"), 
                       chains = 4, 
                       thining = 10, 
                       burnin = 10000, 
                       iter = 1000000)
  cycle_time_est <- extract_time_est(mod_res = mod_res, year = cycle, data_jags = data_jags)
  all_cycle_est <- rbind(all_cycle_est, cycle_time_est)
  omegas <- c(omegas, paste(extract_omega_est(mod_res = mod_res, year = cycle, data_jags = data_jags)))
}

Estimate relationship between election forecast and seat share won

lm_obj <- lm(twoway_seat ~ iter_mean, data = all_cycle_est %>% 
  filter(time_before_elec == 0) %>%
  inner_join(avgs, by = "cycle"))
summary(lm_obj)
## 
## Call:
## lm(formula = twoway_seat ~ iter_mean, data = all_cycle_est %>% 
##     filter(time_before_elec == 0) %>% inner_join(avgs, by = "cycle"))
## 
## Residuals:
##        1        2        3        4        5        6 
## -0.02343  0.05851  0.02750 -0.01126 -0.01618 -0.03514 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)  
## (Intercept)  -0.1496     0.2147  -0.697   0.5245  
## iter_mean     1.2382     0.4177   2.964   0.0414 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.03985 on 4 degrees of freedom
## Multiple R-squared:  0.6871, Adjusted R-squared:  0.6089 
## F-statistic: 8.785 on 1 and 4 DF,  p-value: 0.04139